perm filename COMP1.WRU[206,JMC] blob
sn#070522 filedate 1973-11-07 generic text, type T, neo UTF8
COMPUTER SCIENCE DEPARTMENT
STANFORD UNIVERSITY
CS 206 COMPUTING WITH SYMBOLIC EXPRESSIONS FALL 1973
ADDITIONAL INFORMATION ABOUT LISP COMPILING
1. Some facts about the PDP-10.
The following facts about the PDP-10 may be of use.
The PDP-10 has a 36 bit word and an 18 bit address. In
instructions and in accumulators used as index registers this is the
right part of the word where the least significant bits in arithmetic
reside.
There are 16 general registers which serve simultaneously as
accumulators (receiving the results of arithmetic operations), index
registers (modifying the nominal addresses of instructions to form
effective addresses), and as the first 16 registers of memory (if the
effective address of an instruction is less than 16, then the
instruction uses the corresponding general register as its operand.
All instructions have the same format and are written for the
LAP assembly program in the form
(<op name> <accumulator> <address> <index register>).
Thus %(MOVE 1 3 P)% causes accumulator %1% to receive the contents of
a memory register whose address is %3+c(P), i.e. 3+<the contents of
general register P. If the op name is followed by %@,then the memory
reference is indirect, i.e. the effective addrress is the contents of
the right half of the word directly addressed by the instruction
(modified by the index register and indirect bit of that word). In
the following description of some instructions useful in constructing
the compiler, <ef> denotes the effective address of an instruction.
MOVE c(ac) ← c(<ef>)
MOVEI c(ac) ← <ef>
MOVEM c(<ef>) ← c(ac)
HLRZ c(left half ac) ← right half of c(<ef>)
HRRZ c(right half ac) ←c(right half of c(<ef>)
(These two are used indirectly for %car% and %cdr% respectively.)
ADD c(ac) ← c(ac) + c(<ef>)
SUB c(ac) ← c(ac) - c(<ef>)
JRST go to <ef>
JUMPE if c(ac) = 0 then go to <ef>
JUMPN if c(ac) ≠ 0 then go to <ef>
CAME if c(ac) = c(<ef>) then <skip next
CAMN if c(ac) ≠ 0 then <skip next instruction>
PUSH c(c(right half of ac)) ← c(<ef)); the contents
of each half of ac is increased by one and if
the contents of the left half is then %0, a stack
overflow interrupt occurs. (PUSH P ac) is
is used in LISP to put c(ac) on the
stack.
POP c(<ef>) ←c(c(right half of ac)); the
contents of each half of ac are then decreased
by %1. This may be used for removing the
top element of the stack. Thus (POP P 1) puts
the top element of the stack in accumulator 1.
The i-th element of the stack is obtained by
(MOVE@ 1 i P).
POPJ (POPJ P) is used for returning from a subroutine
These instructions are adequate for compiling basic LISP code
with the addition of the subroutine calling pseudo-instrucion. (CALL
n (E <subr)) is used for calling the LISP subroutine <subr> with %n%
arguments. The convention is that the arguments will be stored in
successive accumulators beginning with accumulator %1, and the result
will be reurned in accumulator %1. In particular the functions
%ATOM% and %CONS% are called with %(CALL 1 (E ATOM))% and (CALL 2 (E
CONS)) respectively, and programs produced by you compiler will be
called similarly when their names are referred to.
2. Code produced by LISP compilers.
I have written two compilers, the simple one called LCOM0
and more optimising compiler called LCOM4.
Currently, LCOM4 produces about half as
many instructions for a given function as does LCOM0. Besides these,
there is the standard PDP-10 LISP compiler written at M.I.T. by
Richard Greenblatt and Stuart Nelson and modified by Whitfield
Diffie.
Consider the function %union% for giving the union of two
unordered lists. We have
union(u,v) ← if n u then v else if a u ε v then
union(d u,v) else a u . union(d u,v).
Its LISP definition is
(DE UNION (U V) (COND ((NULL U) V) ((MEMBER (CAR U) V)
(UNION (CDR U) V)) (T (CONS (CAR U) (UNION (CDR U) V)))))
Here are the programs produced by the three compilers.
The basic compiler LCOM0 generates the following 46
instructions:
(LAP UNION SUBR)
(PUSH P 1)
(PUSH P 2)
(MOVE 1 -1 P)
(JUMPN 1 G0157)
(MOVE 1 0 P)
(JRST G0156)
G0157
(MOVE 1 -1 P)
(HLRZ@ 1 1)
(PUSH P 1)
(MOVE 1 -1 P)
(PUSH P 1)
(SUB P (C 0 0 2 2))
(MOVE 2 2 P)
(MOVE 1 1 P)
(CALL 2 (E MEMBER))
(JUMPE 1 G0158)
(MOVE 1 -1 P)
(HRRZ@ 1 1)
(PUSH P 1)
(MOVE 1 -1 P)
(PUSH P 1)
(SUB P (C 0 0 2 2))
(MOVE 2 2 P)
(MOVE 1 1 P)
(CALL 2 (E UNION))
(JRST G0156)
G0158
(MOVE 1 -1 P)
(HLRZ@ 1 1)
(PUSH P 1)
(MOVE 1 -2 P)
(HRRZ@ 1 1)
(PUSH P 1)
(MOVE 1 -2 P)
(PUSH P 1)
(SUB P (C 0 0 2 2))
(MOVE 2 2 P)
(MOVE 1 1 P)
(CALL 2 (E UNION))
(PUSH P 1)
(SUB P (C 0 0 2 2))
(MOVE 2 2 P)
(MOVE 1 1 P)
(CALL 2 (E CONS))
(JRST G0156)
G0159
G0156
(SUB P (C 0 0 2 2))
(POPJ P)
NIL
LCOM4 produces the following program:
(LAP UNION SUBR)
(PUSH P 1)
(PUSH P 2)
(MOVE 1 -1 P)
(JUMPN 1 G0157)
(MOVE 1 0 P)
(JRST 0 G0156)
G0157
(HLRZ@ 1 -1 P)
(MOVE 2 0 P)
(CALL 2 (E MEMBER))
(JUMPE 1 G0158)
(HRRZ@ 1 -1 P)
(MOVE 2 0 P)
(CALL 2 (E UNION))
(JRST 0 G0156)
G0158
(HRRZ@ 1 -1 P)
(MOVE 2 0 P)
(CALL 2 (E UNION))
(MOVE 2 1)
(HLRZ@ 1 -1 P)
(CALL 2 (E CONS))
G0156
(SUB P (C 0 0 2 2))
(POPJ P)
NIL
The standard PDP-10 LISP compiler produces the following
code:
(LAP UNION SUBR)
(PUSH P 1)
(PUSH P 2)
(JUMPN 1 G0002)
(MOVE 1 2)
(JRST 0 G0001)
G0002 (HLRZ@ 1 1)
(CALL 2 (E MEMBER))
(JUMPE 1 G0003)
(MOVE 2 0 P)
(HRRZ@ 1 -1 P)
(CALL 2 (E UNION))
(JRST 0 G0001)
G0003 (HLRZ@ 1 -1 P)
(MOVE 2 0 P)
(PUSH P 1)
(HRRZ@ 1 -2 P)
(CALL 2 (E UNION))
(POP P 2)
(CALL 2 (E XCONS))
G0008
G0001 (SUB P (C 0 0 2 2))
(POPJ P)
NIL
Note that all three compilers produce the same first line of
heading. This is necessary for the LAP assembly program which also
requires the %NIL% at the end as punctuation. The standard compiler
uses a function called %XCONS% that has the same effect as CONS%
except that it receives its arguments in the reverse order which is
sometimes convenient. This requires, of course, that the compiler be
smart enough to decide where it is more convenient to put the
arguments of the %cons% function.
My two compilers are similar in structure, but the better
one, which is twice as long, uses the following optimisations.
1. When the argument of CAR or CDR is a variable, it compiles
a (HLRZ@ 1 i P) or (HRRZ@ 1 i P) which gets the result through the
stack without first compiling the argument into an accumulator.
2. When it has to set up the arguments of a function in the
accumulators, in general, the program must compute the arguments one
at a time and save them on the stack, and then load the accumulators
from the stack. However, if one of the arguments is a variable, is a
quoted expression, or can be obtained from a variable by a chain of
%cars% and %cdrs, then it need not be computed until the time of
loading accumulators since it can be computed using only the
accumulator in which it is wanted.
3. The Diffy compiler produces about 10 percent less code
than LCOM4; the main difference seems to be that it sometimes notices
when it has an expression that it will need later. Sometimes this
feature leads it astray as in the above example wherein saves %a%u%
for later use at greater cost than re-computing it.
It should further be noted that quoted S-expressions can be
compiled with the instruction
(MOVEI 1 (QUOTE α)).
3. Running the compilers.
If you want to run one of these compilers, say LCOM4, to see
what sort of code it produces, here is the procedure:
1. Prepare a file <file name> without an extension containing
the functions to be compiled in LISP S-expression form, i.e. (DE
<function name> <variable list> <function body>).
2. Give the system command RU LCOM4[206,JMC]<cr>. LCOM4 will
reply with *.
3. Give the RLISP command COMPL <file name>;<cr>. COMPL is
an FEXPR so that <file name> does not have to be preceded by @.
LCOM4 will respond with a list of the functions compiled and the
lengths of the compiled programs exaggerating slightly since all
expressions in the output list are counted. <control C> will get you
back to the system and a file called <file name>.LAP has the compiled
programs.
4. You can look at the program by TYPE <file name>.LAP<cr>,
and if you want to try out the functions you can enter LISP by R
LISP<cr> and after answering N to the ALLOC? question, you can read
in the functions by (DSKIN (<file name>.LAP))<cr>, and they can be
tested in the usual LISP way.